# set global knit options
knitr::opts_chunk$set(
  echo = TRUE,     # print chunks?
  eval = TRUE,     # run chunks?
  error = FALSE,   # print errors?
  warning = FALSE, # print warnings?
  message = FALSE, # print messages?
  cache = TRUE     # cache? be careful with this!
)
# suppress scientific notation
options(scipen=999)
# play a sound if error encountered
options(error = function() {beepr::beep(9)})
# load packages
## create list of package names
packages <- c( # "SIN", # this package was removed from the CRAN repository
  "MASS", "dplyr", "tidyr", "purrr", "extraDistr", "ggplot2", "loo",
  "bridgesampling", "brms", "bayesplot", "tictoc", "hypr", "bcogsci",
  "papaja", "grid", "kableExtra", "gridExtra", "lme4", "cowplot",
  "pdftools", "cmdstanr", "rootSolve", "rstan"
)
# NB: if you haven't already installed bcogsci through devtools, it won't be loaded
## Now load or install & load all
package.check <- lapply(
  packages,
  FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
    }
  }
)
# this is also required, taken from the textbook
## Compiled models are not auto-saved here:
rstan_options(auto_write = FALSE)
## Parallelize the chains using all the cores:
options(mc.cores = parallel::detectCores())
# To solve some conflicts between packages
select <- dplyr::select
extract <- rstan::extract

2 Chapter notes
These notes are based on (nicenboim_introduction_nodate?), drawing both on a PDF version supplied by the authors and on the HTML version available online (accessed in early 2023). Much of the notes are taken verbatim from the book, as are the code snippets.
Set up
3 Ch. 1 - Intro
- given some data, how to use Bayes’ theorem to quantify uncertainty about our belief regarding a scientific question of interest
- topics to be understood:
- the basic concepts behind probability
- the concept of random variables
- probability distributions
- the concept of likelihood
3.1 Probability
Frequency-based versus uncertain-belief perspective of probability:
- repeatable events, like rolling a die and getting a 6, are frequentist because probability is related to the frequency at which we’d observe an outcome given repeated observations
- one-of-a-kind events, like earthquakes, don’t work with this idea of probability
- the probability of an earthquake expresses our uncertainty about an event happening
- we can also be uncertain about how probable an event is: e.g., being 90% sure that something is 50% likely to happen
- this is what we’re interested in: how uncertain we are of an estimate
In Bayesian analysis, we want to express our uncertainty about the probability of observing an outcome (prior distribution).
3.1.1 Conditional probability and Bayes’ rule
- A = “the streets are wet”
- B = “it was raining”
- P(A|B) = the probability of A given B
- P(A,B) = P(A|B)P(B) (the probability of A and B happening)
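Bayes' rule follows from the two ways of factorizing the joint probability: P(A,B) = P(A|B)P(B) = P(B|A)P(A), so P(B|A) = P(A|B)P(B)/P(A). A quick numerical check; the probability values below are made-up toy numbers, not from the book:

```r
# toy values (assumptions for illustration):
p_rain <- 0.2            # P(B): it was raining
p_wet_given_rain <- 0.9  # P(A|B): streets wet given rain
p_wet <- 0.3             # P(A): streets wet, from any cause

# joint probability: P(A, B) = P(A|B) * P(B)
p_wet_and_rain <- p_wet_given_rain * p_rain

# Bayes' rule: P(B|A) = P(A|B) * P(B) / P(A)
p_rain_given_wet <- p_wet_given_rain * p_rain / p_wet
p_rain_given_wet
# [1] 0.6
```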
3.1.2 Law of total probability
- the law of total probability: \(P(A) = \sum_i P(A \mid B_i) P(B_i)\), where the events \(B_i\) partition the sample space
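In words: to get P(A), weight each conditional probability by the probability of its condition and sum over a partition of events. A small sketch continuing the rain example; the numbers are assumptions for illustration:

```r
# partition: B1 = "raining", B2 = "not raining" (toy values, assumed)
p_b <- c(rain = 0.2, no_rain = 0.8)
p_wet_given_b <- c(rain = 0.9, no_rain = 0.15)

# law of total probability: P(A) = sum_i P(A|B_i) * P(B_i)
p_wet <- sum(p_wet_given_b * p_b)
p_wet
# [1] 0.3
```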
3.2 Discrete random variables
Generating random sequences of simulated data with a binomial distribution. Imagine a cloze task, where we consider a particular word a success (1) and any other word a failure (0). If we run the experiment 20 times with a sample size of 10, the success counts for these 20 experiments (divide by 10 to get cloze probabilities) would be:
rbinom(n = 20, size = 10, prob = .5)
[1] 7 7 7 5 6 7 6 4 5 3 8 2 2 5 4 1 6 6 3 7
For discrete random variables such as the binomial, the probability distribution p(y|\(\theta\)) is called a probability mass function (PMF). The PMF defines the probability of each possible outcome. With n = 10 trials, there are 11 possible outcomes (0, 1, 2, …, 10 successes). Which outcome is most probable depends on the parameter \(\theta\) that represents the probability of success. Above, we set \(\theta\) to 0.5.
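The whole PMF can be listed with dbinom: the 11 probabilities sum to 1, and for \(\theta\) = 0.5 the most probable outcome is k = 5.

```r
k <- 0:10
pmf <- dbinom(k, size = 10, prob = 0.5)
round(pmf, 4)
sum(pmf)           # a PMF sums to 1 over all possible outcomes
# [1] 1
k[which.max(pmf)]  # the most probable outcome when theta = 0.5
# [1] 5
```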
3.2.1 The mean and variance of the binomial distribution
In real experimental situations we never know the true value of \(\theta\) (the probability of an outcome), but it can be estimated from the data: \(\theta\) hat = k/n, where k = number of observed successes, n = number of trials, and \(\theta\) hat = observed proportion of successes. \(\theta\) hat is the maximum likelihood estimate of the true but unknown parameter \(\theta\). The mean of the binomial distribution is n\(\theta\), and the variance can be estimated by computing n\(\theta\)(1 - \(\theta\)). These estimates can be used for statistical inference.
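A quick simulation check of these formulas (the seed and simulation size are arbitrary choices, not from the book): with n = 10 and \(\theta\) = 0.5, the sample mean should be close to n\(\theta\) = 5 and the sample variance close to n\(\theta\)(1 - \(\theta\)) = 2.5.

```r
set.seed(123)  # arbitrary seed for reproducibility
n <- 10
theta <- 0.5
k <- rbinom(1e5, size = n, prob = theta)

mean(k)  # close to the theoretical mean n * theta = 5
var(k)   # close to the theoretical variance n * theta * (1 - theta) = 2.5

# the MLE of theta from a single experiment with k[1] successes:
theta_hat <- k[1] / n
```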
3.2.2 Compute probability of a particular outcome (discrete): dbinom
dbinom calculates probability of k successes out of n given a particular \(\theta\).
dbinom(5, size = 10, prob = .5)
[1] 0.2460938
dbinom(5, size = 10, prob = .1)
[1] 0.001488035
dbinom(5, size = 10, prob = .9)
[1] 0.001488035
With continuous data, the probability of obtaining an exact value will always be zero. We'll come to this later.
3.2.3 Compute cumulative probability: pbinom
The cumulative distribution function (CDF) gives the probability of observing k successes or fewer: the sum of the probabilities of all values up to and including k. E.g., the probability of observing 2 successes or fewer (0, 1, or 2) is:
# sum of probabilities for exact k's
dbinom(0, size = 10, prob = .5) +
  dbinom(1, size = 10, prob = .5) +
  dbinom(2, size = 10, prob = .5)
[1] 0.0546875
# or
sum(dbinom(0:2, size = 10, prob = .5))
[1] 0.0546875
# or use pbinom()
pbinom(2, size = 10, prob = 0.5, lower.tail = TRUE)
[1] 0.0546875
# conversely, what is the probability of observing THREE successes or more?
pbinom(2, size = 10, prob = 0.5, lower.tail = F)
[1] 0.9453125
# or
sum(dbinom(3:10, size = 10, prob = .5))
[1] 0.9453125
# the probability of observing 10 or fewer successes (out of 10 trials)
pbinom(10, size = 10, prob = 0.5, lower.tail = TRUE)
[1] 1
3.2.4 Compute the inverse of the CDF (quantile function): qbinom
The quantile function (the inverse of the CDF) returns the smallest value of k (the quantile) such that the probability of obtaining k or fewer successes is at least some specified probability p:
# inverse of pbinom(2, 10, .5):
qbinom(0.0546875, size = 10, prob = .5)
[1] 2
3.2.4.1 Generate simulated data from a binomial distribution: rbinom
# given 1 iteration of 10 trials where p = .5, produce a random value of k
rbinom(1, 10, .5)
[1] 6
3.3 Continuous random variables
Imagine a vector of reading-time data with a normal distribution, defined by its mean and its sd. The probability density function (PDF) for particular values of the mean and sd (assuming a normal distribution) can be calculated using dnorm. The CDF can be found using pnorm, and the inverse CDF using qnorm. These are 3 different ways of looking at the same information.
# density at 400ms when the true mean is 500 & sd = 100 (PDF)
dnorm(400, mean = 500, sd = 100)
[1] 0.002419707
# p of observing 400ms *or lower* when the true mean is 500 & sd = 100 (CDF)
pnorm(400, mean = 500, sd = 100)
[1] 0.1586553
# k with a CDF of 0.1586553 when the true mean is 500 & sd = 100 (inverse CDF)
qnorm(0.1586553, mean = 500, sd = 100)
[1] 400
Question: what is the probability of observing values between 200 and 700 from a normal distribution where mean = 500 and sd = 100?
pnorm(700, 500, 100) - pnorm(200, 500, 100)
[1] 0.9759
With continuous data, it is only meaningful to ask about probabilities between two point values (e.g., probability that Y lies between a and b).
What is the quantile q such that the probability of observing q or something smaller is 0.975 (given the normal(500, 100) distribution)?
qnorm(0.975, mean = 500, sd = 100)
[1] 695.9964
Next task: generate simulated data. Generate 10 data points using the rnorm function and use these simulated data to compute the mean and standard deviation.
x <- rnorm(10, 500, 100)
mean(x)
[1] 600.5572
sd(x)
[1] 78.86937
# can also compute lower and upper bounds of 95% CIs
quantile(x, probs = c(.025, .975))
    2.5%    97.5%
469.3127 720.3811
3.3.1 An important distinction: probability vs. density in continuous random variables
The probability density function (PDF):
# density with default mean = 0 and sd = 1
dnorm(1)
[1] 0.2419707
This is not the probability of observing 1 in this distribution, as the probability of a single value in a continuous distribution will always be 0. This is because probability in a continuous distribution is the area under the curve, and at a single point there is no area under the curve (i.e., p = 0). The pnorm function gives the cumulative distribution function (CDF) for the normal distribution.
For example, the probability of observing a value between +/-2 in a normal distribution with mean 0 and sd 1:
pnorm(2, mean = 0, sd = 1) - pnorm(-2, mean = 0, sd = 1)
[1] 0.9544997
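The relation between density and probability: the density at a point is not itself a probability, but density times a small interval width approximates the probability of falling in that interval. A minimal sketch for the standard normal:

```r
# probability of falling in a narrow interval around y = 1
eps <- 0.001
exact <- pnorm(1 + eps / 2) - pnorm(1 - eps / 2)  # CDF difference
approx <- dnorm(1) * eps                          # density * interval width
c(exact, approx)  # the two agree closely for small eps
```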
For discrete random variables, the situation is different. These have a probability mass function (PMF), such as the binomial distribution we saw before. The PMF maps the possible values of y to the probabilities of those exact values occurring.
dbinom(2, size = 10, prob = .5)
[1] 0.04394531
3.3.2 Truncating a normal distribution
E.g., restricting the distribution to positive values only (truncating at 0), as is natural for reading times.
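One way to draw from a normal truncated at 0 is rejection sampling: draw from the untruncated normal and discard the negative values. A minimal sketch; the function name, mean, sd, and seed below are made up for illustration:

```r
# rejection sampler for a normal distribution truncated at 0:
# keep drawing from the untruncated normal, discard negative values
rtruncnorm0 <- function(n, mean, sd) {
  out <- numeric(0)
  while (length(out) < n) {
    draws <- rnorm(n, mean, sd)
    out <- c(out, draws[draws >= 0])
  }
  out[1:n]
}

set.seed(1)  # arbitrary seed
x <- rtruncnorm0(1000, mean = 300, sd = 200)
min(x) >= 0  # all draws are non-negative
# [1] TRUE
mean(x)      # above 300: cutting off the negative tail shifts the mean up
```

The extraDistr package loaded in the setup chunk also provides truncated-normal functions directly (e.g., rtnorm() and dtnorm(), with truncation bounds a and b).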
3.4 Bivariate and multivariate distributions
Consider a case where two discrete responses were recorded: a binary yes/no response, and a Likert acceptability rating (1-7).
The joint probability mass function is the joint PMF of two random variables.
Let’s play around with some such data:
# run if package is not loaded
# library(bcogsci)
data("df_discreteagrmt")

3.4.0.1 Marginal distributions
The marginal distribution of each variable (let's say x = the binary response, y = the Likert response) is computed by summing the joint PMF over the values of the other variable, e.g., over the rows of the joint probability matrix:
rowSums(probs)
# NB: the object probs is not defined in the book
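Since probs is not defined, here is a toy joint PMF showing how the marginals fall out of rowSums and colSums; the probability values are assumptions, and the rating scale is shortened to 1-3 for readability:

```r
# toy joint PMF over x (rows: yes/no) and y (columns: ratings 1-3);
# the probabilities are made up for illustration
probs <- matrix(c(0.10, 0.15, 0.25,
                  0.20, 0.20, 0.10),
                nrow = 2, byrow = TRUE,
                dimnames = list(x = c("yes", "no"), y = 1:3))

rowSums(probs)  # marginal PMF of x: sum over the values of y
colSums(probs)  # marginal PMF of y: sum over the values of x
sum(probs)      # a joint PMF sums to 1
# [1] 1
```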
3.4.1 Generate simulated bivariate (multivariate) data
Suppose we want to generate 100 pairs of correlated data, with correlation rho = 0.6. The two random variables have mean 0, and standard deviations 5 and 10 respectively.
## define a variance-covariance matrix:
Sigma <- matrix(c(5^2, 5 * 10 * .6, 5 * 10 * .6, 10^2),
byrow = FALSE, ncol = 2
)
## generate data:
u <- mvrnorm(
n = 100,
mu = c(0, 0),
Sigma = Sigma
)
head(u, n = 3)
           [,1]      [,2]
[1,]   4.187286  8.903810
[2,] -12.405573 -7.363094
[3,]   2.121652 -6.667865
# plot the data
ggplot(tibble(u_1 = u[, 1], u_2 = u[, 2]), aes(u_1, u_2)) +
  geom_point()

3.5 An important concept: the marginal likelihood (integrating out a parameter)
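A minimal sketch of what "integrating out a parameter" means (this example is not from the book): the marginal likelihood of k = 7 successes in n = 10 trials under a uniform Beta(1, 1) prior on \(\theta\), computed by integrating the likelihood over the prior.

```r
# marginal likelihood of k = 7 successes in n = 10 trials,
# integrating the binomial likelihood over a Beta(1, 1) prior on theta:
# p(k) = integral_0^1 dbinom(k, n, theta) * dbeta(theta, 1, 1) dtheta
k <- 7
n <- 10
marg_lik <- integrate(
  function(theta) dbinom(k, size = n, prob = theta) * dbeta(theta, 1, 1),
  lower = 0, upper = 1
)$value
marg_lik
# [1] 0.09090909
# with a uniform prior this equals 1 / (n + 1):
1 / (n + 1)
# [1] 0.09090909
```

Theta no longer appears in the result: it has been integrated out, leaving the probability of the data averaged over the prior.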
4 Session Info
Compiled with R version 4.4.0 (2024-04-24) (Puppy Cup) in RStudio version 2023.3.0.386 (Cherry Blossom).
sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] grid stats graphics grDevices datasets utils methods
[8] base
other attached packages:
[1] rstan_2.32.6 StanHeaders_2.32.7 rootSolve_1.8.2.4
[4] cmdstanr_0.7.1 pdftools_3.4.0 cowplot_1.1.3
[7] lme4_1.1-35.3 Matrix_1.7-0 gridExtra_2.3
[10] kableExtra_1.4.0 papaja_0.1.2 tinylabels_0.2.4
[13] bcogsci_0.0.0.9000 hypr_0.2.8 tictoc_1.2.1
[16] bayesplot_1.11.1 brms_2.21.0 Rcpp_1.0.12
[19] bridgesampling_1.1-2 loo_2.7.0 ggplot2_3.5.1
[22] extraDistr_1.10.0 purrr_1.0.2 tidyr_1.3.1
[25] dplyr_1.1.4 MASS_7.3-60.2
loaded via a namespace (and not attached):
[1] Rdpack_2.6 inline_0.3.19 sandwich_3.1-0
[4] rlang_1.1.3 magrittr_2.0.3 multcomp_1.4-25
[7] matrixStats_1.3.0 compiler_4.4.0 systemfonts_1.0.6
[10] vctrs_0.6.5 stringr_1.5.1 pkgconfig_2.0.3
[13] fastmap_1.1.1 backports_1.4.1 labeling_0.4.3
[16] effectsize_0.8.7 utf8_1.2.4 rmarkdown_2.26
[19] pracma_2.4.4 ps_1.7.6 nloptr_2.0.3
[22] xfun_0.43 jsonlite_1.8.8 parallel_4.4.0
[25] R6_2.5.1 stringi_1.8.4 boot_1.3-30
[28] estimability_1.5 knitr_1.46 zoo_1.8-12
[31] parameters_0.21.6 splines_4.4.0 tidyselect_1.2.1
[34] rstudioapi_0.16.0 abind_1.4-5 yaml_2.3.8
[37] codetools_0.2-20 processx_3.8.4 pkgbuild_1.4.4
[40] qpdf_1.3.3 lattice_0.22-6 tibble_3.2.1
[43] withr_3.0.0 bayestestR_0.13.2 askpass_1.2.0
[46] posterior_1.5.0 coda_0.19-4.1 evaluate_0.23
[49] survival_3.5-8 RcppParallel_5.1.7 xml2_1.3.6
[52] pillar_1.9.0 tensorA_0.36.2.1 checkmate_2.3.1
[55] renv_0.17.2 stats4_4.4.0 insight_0.19.10
[58] distributional_0.4.0 generics_0.1.3 rstantools_2.4.0
[61] munsell_0.5.1 scales_1.3.0 minqa_1.2.6
[64] xtable_1.8-4 glue_1.7.0 emmeans_1.10.1
[67] tools_4.4.0 mvtnorm_1.2-4 rbibutils_2.2.16
[70] QuickJSR_1.1.3 datawizard_0.10.0 colorspace_2.1-0
[73] nlme_3.1-164 cli_3.6.2 fansi_1.0.6
[76] viridisLite_0.4.2 svglite_2.1.3 Brobdingnag_1.2-9
[79] gtable_0.3.5 digest_0.6.35 TH.data_1.1-2
[82] htmlwidgets_1.6.4 farver_2.1.1 htmltools_0.5.8.1
[85] lifecycle_1.0.4